Are the discretised lognormal and hooked power law distributions plausible for citation data?
Abstract
There is no agreement over which statistical distribution is most appropriate for modelling citation count data. This matters because, if one distribution were accepted, the relative merits of different citation-based indicators, such as percentiles, arithmetic means and geometric means, could be assessed more fully. In response, this article investigates the plausibility of the discretised lognormal and hooked power law distributions for modelling the full range of citation counts, with an offset of 1. The citation counts from 23 Scopus subcategories were fitted to hooked power law and discretised lognormal distributions, but both distributions failed a Kolmogorov-Smirnov goodness-of-fit test in over three quarters of cases. The discretised lognormal distribution also seems to have the wrong shape for citation distributions, with too few zeros and too few medium values for all subjects. The poor fits could be caused by the impurity of the subject subcategories or by the presence of interdisciplinary research. Although, in theory, subject subcategory purity could be tested indirectly through a goodness-of-fit test given large enough sample sizes, this is probably not possible in practice. Hence it seems difficult to obtain conclusive evidence about the theoretically most appropriate statistical distribution.
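To make the fitting and testing steps described above more concrete, here is a minimal Python sketch; it is not the article's code. It assumes the discretised lognormal takes the form p(n) ∝ exp(-(ln(n+1) - μ)²/(2σ²))/(n+1) (the offset of 1) and the hooked power law the form p(n) ∝ (B + n)^(-α), both normalised over a finite support 0..n_max. The function names, the truncation point, the toy counts and the grid-search fitting (a crude stand-in for proper numerical maximum likelihood) are all hypothetical. The KS function returns only the statistic D; turning it into a p-value for discrete data with estimated parameters would need extra work, such as a parametric bootstrap, which is not shown.

```python
import math
from collections import Counter


def _normalise(log_weights):
    """Turn unnormalised log-weights into a log-pmf (log-sum-exp for stability)."""
    m = max(log_weights)
    log_z = m + math.log(sum(math.exp(w - m) for w in log_weights))
    return [w - log_z for w in log_weights]


def log_pmf_discretised_lognormal(mu, sigma, n_max):
    """Assumed form: p(n) ∝ exp(-(ln(n+1) - mu)^2 / (2 sigma^2)) / (n+1), n = 0..n_max."""
    return _normalise([
        -((math.log(n + 1) - mu) ** 2) / (2 * sigma ** 2) - math.log(n + 1)
        for n in range(n_max + 1)
    ])


def log_pmf_hooked_power_law(b, alpha, n_max):
    """Assumed form: p(n) ∝ (b + n)^(-alpha), n = 0..n_max, with b > 0."""
    return _normalise([-alpha * math.log(b + n) for n in range(n_max + 1)])


def log_likelihood(log_pmf, counts):
    # Every count must lie in 0..n_max for this lookup to work.
    return sum(log_pmf[c] for c in counts)


def fit_by_grid(make_log_pmf, grid_1, grid_2, counts, n_max):
    """Crude maximum likelihood: evaluate every parameter pair, keep the best."""
    best = None
    for p1 in grid_1:
        for p2 in grid_2:
            ll = log_likelihood(make_log_pmf(p1, p2, n_max), counts)
            if best is None or ll > best[0]:
                best = (ll, p1, p2)
    return best  # (log-likelihood, first parameter, second parameter)


def ks_statistic(log_pmf, counts):
    """Kolmogorov-Smirnov D: largest gap between empirical and model CDFs."""
    freq = Counter(counts)
    emp_cdf = model_cdf = d = 0.0
    for k, lp in enumerate(log_pmf):
        emp_cdf += freq.get(k, 0) / len(counts)
        model_cdf += math.exp(lp)
        d = max(d, abs(emp_cdf - model_cdf))
    return d


if __name__ == "__main__":
    # Hypothetical toy counts for illustration only, not data from the article.
    citations = [0, 0, 0, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12, 20, 35]
    n_max = 500  # assumed truncation point for the support
    mus = [-1.0 + 0.2 * i for i in range(21)]       # mu: -1.0 .. 3.0
    sigmas = [0.2 + 0.2 * i for i in range(15)]     # sigma: 0.2 .. 3.0
    bs = [0.5] + [float(i) for i in range(1, 21)]   # b: 0.5, 1 .. 20
    alphas = [1.0 + 0.2 * i for i in range(16)]     # alpha: 1.0 .. 4.0
    for name, make, g1, g2 in [
        ("discretised lognormal", log_pmf_discretised_lognormal, mus, sigmas),
        ("hooked power law", log_pmf_hooked_power_law, bs, alphas),
    ]:
        ll, p1, p2 = fit_by_grid(make, g1, g2, citations, n_max)
        d = ks_statistic(make(p1, p2, n_max), citations)
        print(f"{name}: params=({p1:.2f}, {p2:.2f}) logL={ll:.2f} KS D={d:.3f}")
```

Running the script prints the best grid parameters, log-likelihood and KS statistic for each model. The grid search only keeps the sketch dependency-free; a numerical optimiser would usually be preferred for real data.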
Similar resources
Are there too many uncited articles? Zero inflated variants of the discretised lognormal and hooked power law distributions
Although statistical models fit many citation data sets reasonably well with the best fitting models being the hooked power law and discretised lognormal distribution, the fits are rarely close. One possible reason is that there might be more uncited articles than would be predicted by any model if some articles are inherently uncitable. Using data from 23 different Scopus categories, this arti...
The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression
Identifying the statistical distribution that best fits citation data is important to allow robust and powerful quantitative analyses. Whilst previous studies have suggested that both the hooked power law and discretised lognormal distributions fit better than the power law and negative binomial distributions, no comparisons so far have covered all articles within a discipline, including those ...
Citation count distributions for large monodisciplinary journals
Mike Thelwall, Statistical Cybermetrics Research Group, University of Wolverhampton, UK. Many different citation-based indicators are used by researchers and research evaluators to help evaluate the impact of scholarly outputs. Although the appropriateness of individual citation indicators depends in part on the statistical properties of citation counts, there is no universally agreed best-fitt...
Distributions for cited articles from individual subjects and years
The citations to a set of academic articles are typically unevenly shared, with many articles attracting few citations and few attracting many. It is important to know more precisely how citations are distributed in order to help statistical analyses of citations, especially for sets of articles from a single discipline and a small range of years, as normally used for research evaluation. This ...
Stopped Sum Models for Citation Data
It is important to identify the most appropriate statistical model for citation data in order to maximise the power of future analyses as well as to shed light on the processes that drive citations. This article assesses stopped sum models and compares them with two previously used models, the discretised lognormal and negative binomial distributions using the Akaike Information Criterion (AIC)...
Journal:
- J. Informetrics
Volume 10, Issue
Pages -
Publication date: 2016